An Efficient Two-Pass Decoder for SMT Using Word Confidence Estimation

Authors

  • Ngoc-Quang Luong
  • Laurent Besacier
  • Benjamin Lecouteux
Abstract

During decoding, the Statistical Machine Translation (SMT) decoder traverses all complete paths in the Search Graph (SG), seeks those with the lowest costs, and backtracks to read off the best translations. Although these winners beat the rest in model scores, there is no guarantee that they have the highest quality with respect to human references. This paper exploits Word Confidence Estimation (WCE) scores in a second decoding pass to improve Machine Translation (MT) quality. By using the confidence score of each word in the N-best list to update the cost of the SG hypotheses containing it, we hope to “reinforce” or “weaken” them based on word quality. After the update, the best translations are re-determined using the updated costs. In experiments with our real WCE scores and ideal (oracle) ones, the latter significantly boosts the one-pass decoder by 7.87 BLEU points, while the former yields an improvement of 1.49 points on the same metric.
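
The two-pass idea in the abstract can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the abstract does not give the actual cost-update formula, so the linear rule in `rescore`, the `weight` and `neutral` parameters, the function names, and the four-edge search graph are all hypothetical. Confidence scores are looked up per word here, whereas the paper takes them from the N-best list.

```python
# Toy two-pass decoding sketch (assumed update rule, not the paper's formula).
# Each edge of the search graph is (source_node, target_node, word, cost).
# Node ids are integers in topological order, so source < target.

def best_path(edges, start, goal):
    """Return (cost, words) of the cheapest path from start to goal."""
    best = {start: (0.0, [])}
    # Visiting edges by increasing source node guarantees that the best
    # cost of a node is final before its outgoing edges are expanded.
    for src, dst, word, cost in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue
        candidate = best[src][0] + cost
        if dst not in best or candidate < best[dst][0]:
            best[dst] = (candidate, best[src][1] + [word])
    return best[goal]

def rescore(edges, confidence, weight=1.0, neutral=0.5):
    """Hypothetical update: words with confidence above `neutral` get
    cheaper (reinforced), words below it get more expensive (weakened)."""
    return [
        (src, dst, word, cost - weight * (confidence.get(word, neutral) - neutral))
        for src, dst, word, cost in edges
    ]

# Hypothetical search graph with two competing words between nodes 1 and 2.
graph = [
    (0, 1, "the", 1.0),
    (1, 2, "home", 0.8),
    (1, 2, "house", 1.0),
    (2, 3, "small", 1.0),
]
# Hypothetical word confidence scores in [0, 1].
wce = {"the": 0.9, "home": 0.1, "house": 0.9, "small": 0.9}

first_pass = best_path(graph, 0, 3)
second_pass = best_path(rescore(graph, wce), 0, 3)
print(" ".join(first_pass[1]))   # the home small
print(" ".join(second_pass[1]))  # the house small
```

In this toy graph the first pass prefers "home" because its model cost is lower, while the second pass switches to "house" once the low confidence of "home" raises its cost.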

Similar resources

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Two-pass Continuous Digit String Decoder

In this paper, we present a two-pass continuous digit string decoder using two sets of whole-word HMM models. One set contains context-independent (CI) models used in the first-pass search. The first-pass search results in N-best hypotheses from which an N-best word lattice can be derived. The other set contains context-dependent (CD) HMM models used to search along the N-best word lattice for t...

Efficient 2-pass n-best decoder

In this paper, we describe the new BBN BYBLOS efficient 2-Pass N-Best decoder used for the 1996 Hub-4 Benchmark Tests. The decoder uses a quick fastmatch to determine the likely word endings. Then in the second pass, it performs a time-synchronous beam search using a detailed continuous-density HMM and a trigram language model to decide the word starting positions. From these word starts, the dec...

Dependency Treelet Translation: Syntactically Informed Phrasal SMT

We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. This method requires a source-language dependency parser, target language word segmentation and an unsupervised word alignment component. We align a parallel corpus, project the source dependency parse onto the target sentence, e...

Hypergraph Training and Decoding of System Combination in SMT

Traditional n-best-based training and decoding methods for system combination can propagate errors because of imprecise parameter estimation and premature pruning. To alleviate this problem, the paper proposes hypergraph (HG) based three-pass training and three-pass decoding for different features. To construct the HG, this paper introduces simplified bracket transduction gram...

Publication date: 2014